<!DOCTYPE html>

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

    <meta http-equiv="x-ua-compatible" content="ie=edge">
    
    <title>3.5.2. 动态权重建模 &#8212; FunRec 推荐系统 0.0.1 documentation</title>

    <link rel="stylesheet" href="../../_static/material-design-lite-1.3.0/material.blue-deep_orange.min.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/sphinx_materialdesign_theme.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/fontawesome/all.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/fonts.css" type="text/css" />
    <link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
    <link rel="stylesheet" type="text/css" href="../../_static/basic.css" />
    <link rel="stylesheet" type="text/css" href="../../_static/d2l.css" />
    <script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
    <script src="../../_static/jquery.js"></script>
    <script src="../../_static/underscore.js"></script>
    <script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
    <script src="../../_static/doctools.js"></script>
    <script src="../../_static/sphinx_highlight.js"></script>
    <script src="../../_static/d2l.js"></script>
    <script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
    <link rel="index" title="Index" href="../../genindex.html" />
    <link rel="search" title="Search" href="../../search.html" />
    <link rel="next" title="4. 重排模型" href="../../chapter_3_rerank/index.html" />
    <link rel="prev" title="3.5.1. 多塔结构" href="1.multi_tower.html" /> 
  </head>
<body>
    <div class="mdl-layout mdl-js-layout mdl-layout--fixed-header mdl-layout--fixed-drawer"><header class="mdl-layout__header mdl-layout__header--waterfall ">
    <div class="mdl-layout__header-row">
        
        <nav class="mdl-navigation breadcrumb">
            <a class="mdl-navigation__link" href="../index.html"><span class="section-number">3. </span>精排模型</a><i class="material-icons">navigate_next</i>
            <a class="mdl-navigation__link" href="index.html"><span class="section-number">3.5. </span>多场景建模</a><i class="material-icons">navigate_next</i>
            <a class="mdl-navigation__link is-active"><span class="section-number">3.5.2. </span>动态权重建模</a>
        </nav>
        <div class="mdl-layout-spacer"></div>
        <nav class="mdl-navigation">
        
<form class="form-inline pull-sm-right" action="../../search.html" method="get">
      <div class="mdl-textfield mdl-js-textfield mdl-textfield--expandable mdl-textfield--floating-label mdl-textfield--align-right">
        <label id="quick-search-icon" class="mdl-button mdl-js-button mdl-button--icon"  for="waterfall-exp">
          <i class="material-icons">search</i>
        </label>
        <div class="mdl-textfield__expandable-holder">
          <input class="mdl-textfield__input" type="text" name="q"  id="waterfall-exp" placeholder="Search" />
          <input type="hidden" name="check_keywords" value="yes" />
          <input type="hidden" name="area" value="default" />
        </div>
      </div>
      <div class="mdl-tooltip" data-mdl-for="quick-search-icon">
      Quick search
      </div>
</form>
        
<a id="button-show-source"
    class="mdl-button mdl-js-button mdl-button--icon"
    href="../../_sources/chapter_2_ranking/5.multi_scenario/2.dynamic_weight.rst.txt" rel="nofollow">
  <i class="material-icons">code</i>
</a>
<div class="mdl-tooltip" data-mdl-for="button-show-source">
Show Source
</div>
        </nav>
    </div>
    <div class="mdl-layout__header-row header-links">
      <div class="mdl-layout-spacer"></div>
      <nav class="mdl-navigation">
          
              <a  class="mdl-navigation__link" href="https://funrec-notebooks.s3.eu-west-3.amazonaws.com/fun-rec.zip">
                  <i class="fas fa-download"></i>
                  Jupyter 记事本
              </a>
          
              <a  class="mdl-navigation__link" href="https://github.com/datawhalechina/fun-rec">
                  <i class="fab fa-github"></i>
                  GitHub
              </a>
      </nav>
    </div>
</header><header class="mdl-layout__drawer">
    
          <!-- Title -->
      <span class="mdl-layout-title">
          <a class="title" href="../../index.html">
              <span class="title-text">
                  FunRec 推荐系统
              </span>
          </a>
      </span>
    
    
      <div class="globaltoc">
        <span class="mdl-layout-title toc">Table Of Contents</span>
        
        
            
            <nav class="mdl-navigation">
                <ul>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_preface/index.html">前言</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_installation/index.html">安装</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_notation/index.html">符号</a></li>
</ul>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../chapter_0_introduction/index.html">1. 推荐系统概述</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_0_introduction/1.intro.html">1.1. 推荐系统是什么？</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_0_introduction/2.outline.html">1.2. 本书概览</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_1_retrieval/index.html">2. 召回模型</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_1_retrieval/1.cf/index.html">2.1. 协同过滤</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/1.cf/1.itemcf.html">2.1.1. 基于物品的协同过滤</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/1.cf/2.usercf.html">2.1.2. 基于用户的协同过滤</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/1.cf/3.mf.html">2.1.3. 矩阵分解</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/1.cf/4.summary.html">2.1.4. 总结</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_1_retrieval/2.embedding/index.html">2.2. 向量召回</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/2.embedding/1.i2i.html">2.2.1. I2I召回</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/2.embedding/2.u2i.html">2.2.2. U2I召回</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/2.embedding/3.summary.html">2.2.3. 总结</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_1_retrieval/3.sequence/index.html">2.3. 序列召回</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/3.sequence/1.user_interests.html">2.3.1. 深化用户兴趣表示</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/3.sequence/2.generateive_recall.html">2.3.2. 生成式召回方法</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_1_retrieval/3.sequence/3.summary.html">2.3.3. 总结</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">3. 精排模型</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../1.wide_and_deep.html">3.1. 记忆与泛化</a></li>
<li class="toctree-l2"><a class="reference internal" href="../2.feature_crossing/index.html">3.2. 特征交叉</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../2.feature_crossing/1.second_order.html">3.2.1. 二阶特征交叉</a></li>
<li class="toctree-l3"><a class="reference internal" href="../2.feature_crossing/2.higher_order.html">3.2.2. 高阶特征交叉</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../3.sequence.html">3.3. 序列建模</a></li>
<li class="toctree-l2"><a class="reference internal" href="../4.multi_objective/index.html">3.4. 多目标建模</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../4.multi_objective/1.arch.html">3.4.1. 基础结构演进</a></li>
<li class="toctree-l3"><a class="reference internal" href="../4.multi_objective/2.dependency_modeling.html">3.4.2. 任务依赖建模</a></li>
<li class="toctree-l3"><a class="reference internal" href="../4.multi_objective/3.multi_loss_optim.html">3.4.3. 多目标损失融合</a></li>
</ul>
</li>
<li class="toctree-l2 current"><a class="reference internal" href="index.html">3.5. 多场景建模</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="1.multi_tower.html">3.5.1. 多塔结构</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">3.5.2. 动态权重建模</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_3_rerank/index.html">4. 重排模型</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_3_rerank/1.greedy.html">4.1. 基于贪心的重排</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_3_rerank/2.personalized.html">4.2. 基于个性化的重排</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_3_rerank/3.summary.html">4.3. 本章小结</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_4_trends/index.html">5. 难点及热点研究</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/1.debias.html">5.1. 模型去偏</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/2.cold_start.html">5.2. 冷启动问题</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/3.generative.html">5.3. 生成式推荐</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/4.summary.html">5.4. 本章小结</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_5_projects/index.html">6. 项目实践</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/1.understanding.html">6.1. 赛题理解</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/2.baseline.html">6.2. Baseline</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/3.analysis.html">6.3. 数据分析</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/4.recall.html">6.4. 多路召回</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/5.feature_engineering.html">6.5. 特征工程</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/6.ranking.html">6.6. 排序模型</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_appendix/index.html">7. Appendix</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_appendix/word2vec.html">7.1. Word2vec</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_references/references.html">参考文献</a></li>
</ul>

            </nav>
        
        </div>
    
</header>
        <main class="mdl-layout__content" tabIndex="0">

	<script type="text/javascript" src="../../_static/sphinx_materialdesign_theme.js"></script>

    <div class="document">
        <div class="page-content" role="main">
        
  <section id="dynamic-weight">
<span id="id1"></span><h1><span class="section-number">3.5.2. </span>动态权重建模<a class="headerlink" href="#dynamic-weight" title="Permalink to this heading">¶</a></h1>
<p>在前一小节，我们探讨了 HMoE 和 STAR
这类基于多塔结构的多场景建模方案。它们通过为不同场景构建独立的“专家”网络或塔底参数，有效地捕获了场景间的特异性信息，解决了模型在跨场景迁移时因参数冲突导致的性能下降问题。这类模型的核心思想是“分而治之”，通过物理隔离的参数空间来保障场景的独特性。</p>
<p>为了在保持模型参数高效共享的同时，实现更细粒度、更灵活的场景感知能力，研究者们提出了“动态权重建模”的新范式。
这类方法的核心理念不再是构建物理隔离的参数塔，而是让模型的核心网络参数在不同场景下共享一个基础，但通过动态生成的、与场景/样本高度相关的“权重”来调制（Modulate）这些共享参数的行为。这相当于为共享网络“注入”了场景和样本的上下文信息，使其能够根据当前上下文动态调整其计算逻辑。</p>
<p>本节将重点介绍几种具有代表性的动态权重建模方案，它们展示了如何巧妙地设计“权重生成器”来调制共享网络。它们通过引入动态性，在参数效率、灵活性和性能之间取得了更优的平衡，为构建更加智能、自适应的多场景推荐系统提供了有力工具。</p>
<section id="pepnet">
<h2><span class="section-number">3.5.2.1. </span>PEPNET<a class="headerlink" href="#pepnet" title="Permalink to this heading">¶</a></h2>
<p>PEPNet（Parameter and Embedding Personalized Network）
<span id="id2">(<a class="reference internal" href="../../chapter_references/references.html#id61" title="Chang, J., Zhang, C., Hui, Y., Leng, D., Niu, Y., Song, Y., &amp; Gai, K. (2023). Pepnet: parameter and embedding personalized network for infusing with personalized prior information. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 3795–3804).">Chang <em>et al.</em>, 2023</a>)</span>
核心目标是解决多场景多任务中的双重跷跷板效应（Double Seesaw
Phenomenon）。</p>
<ul class="simple">
<li><p>场景跷跷板（Domain
Seesaw）：混合训练时不同场景数据分布差异导致表征无法对齐；</p></li>
<li><p>任务跷跷板（Task
Seesaw）：多任务间稀疏性与依赖关系失衡导致目标相互抑制；</p></li>
</ul>
<p>PEPNet通过两大模块实现动态权重调控，形成“底层场景适配 + 顶层任务适配”的分层个性化，这也是其实现参数个性化的核心思路。PEPNet模型结构如下：</p>
<figure class="align-default" id="id7">
<span id="pepnet-model-structure"></span><a class="reference internal image-reference" href="../../_images/pepnet.png"><img alt="../../_images/pepnet.png" src="../../_images/pepnet.png" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">图3.5.5 </span><span class="caption-text">PEPNet模型结构</span><a class="headerlink" href="#id7" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p>在介绍PEPNet的两大核心组件之前，需要先简单介绍一个通用的门控模块Gate NU。EPNet与PPNet均基于轻量级门控单元Gate NU构建，以极低的参数量实现参数个性化。Gate NU受语音识别领域LHUC
<span id="id3">(<a class="reference internal" href="../../chapter_references/references.html#id97" title="Swietojanski, P., Li, J., &amp; Renals, S. (2016). Learning hidden unit contributions for unsupervised acoustic model adaptation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(8), 1450–1463.">Swietojanski <em>et al.</em>, 2016</a>)</span>
模型启发，通过两层网络生成动态缩放权重：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-0">
<span class="eqno">(3.5.9)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-0" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
&amp;\mathbf{x'} = \text{ReLU}(\mathbf{x} \mathbf{W_1} + \mathbf{b_1}) \\
&amp;\delta = \gamma \cdot \text{Sigmoid}(\mathbf{x'} \mathbf{W_2} + \mathbf{b_2}) \quad \in [0, \gamma]
\end{aligned}\end{split}\]</div>
<p>其中<span class="math notranslate nohighlight">\(\mathbf{x}\)</span>为个性化先验特征（如场景ID或用户画像），<span class="math notranslate nohighlight">\(\gamma\)</span>为缩放强度（经验值设为2）。输出<span class="math notranslate nohighlight">\(\boldsymbol{\delta}\)</span>与目标参数维度对齐，通过逐元素相乘（<span class="math notranslate nohighlight">\(\otimes\)</span>）实现调制。Gate
NU的代码实现如下：</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">GateNU</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Layer</span><span class="p">):</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd">    两层门控网络（NU）：用于为不同分支/专家动态生成权重或系数。</span>
<span class="sd">    结构：Dense(ReLU) -&gt; Dense(Sigmoid) -&gt; gamma 缩放。</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span>
                 <span class="n">hidden_units</span><span class="p">,</span>
                 <span class="n">gamma</span><span class="o">=</span><span class="mf">2.</span><span class="p">,</span>
                 <span class="n">l2_reg</span><span class="o">=</span><span class="mf">0.</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">GateNU</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span>  <span class="c1"># 必须先调用父类构造函数，Keras Layer 才允许设置子层属性</span>
        <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">hidden_units</span><span class="p">)</span> <span class="o">==</span> <span class="mi">2</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">gamma</span> <span class="o">=</span> <span class="n">gamma</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">dense_layers</span> <span class="o">=</span> <span class="p">[</span>
            <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="n">hidden_units</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">activation</span><span class="o">=</span><span class="s2">&quot;relu&quot;</span><span class="p">,</span> <span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">regularizers</span><span class="o">.</span><span class="n">l2</span><span class="p">(</span><span class="n">l2_reg</span><span class="p">)),</span>  <span class="c1"># 第一层：非线性特征提取</span>
            <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="n">hidden_units</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">activation</span><span class="o">=</span><span class="s2">&quot;sigmoid&quot;</span><span class="p">,</span> <span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">regularizers</span><span class="o">.</span><span class="n">l2</span><span class="p">(</span><span class="n">l2_reg</span><span class="p">))</span> <span class="c1"># 第二层：输出 (0,1) 门值</span>
        <span class="p">]</span>

    <span class="k">def</span><span class="w"> </span><span class="nf">call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">):</span>
        <span class="n">output</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">dense_layers</span><span class="p">[</span><span class="mi">0</span><span class="p">](</span><span class="n">inputs</span><span class="p">)</span>  <span class="c1"># [B, hidden_units[0]]</span>
        <span class="c1"># 乘以 gamma 对 Sigmoid 输出进行缩放。</span>
        <span class="n">output</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">gamma</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">dense_layers</span><span class="p">[</span><span class="mi">1</span><span class="p">](</span><span class="n">output</span><span class="p">)</span>  <span class="c1"># [B, hidden_units[1]]</span>
        <span class="k">return</span> <span class="n">output</span>
</pre></div>
</div>
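<p>为直观验证 Gate NU 的取值范围，下面给出一个基于 NumPy 的最小数值示例（这是示意性草图，并非正文的 TensorFlow 实现：权重随机初始化仅作演示，<code>gate_nu</code> 为本示例自拟的函数名）：</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate_nu(x, W1, b1, W2, b2, gamma=2.0):
    # x' = ReLU(x W1 + b1)；delta = gamma * Sigmoid(x' W2 + b2)
    h = np.maximum(x @ W1 + b1, 0.0)
    return gamma * sigmoid(h @ W2 + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                           # 4 个样本的个性化先验特征
W1, b1 = 0.1 * rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = 0.1 * rng.normal(size=(16, 8)), np.zeros(8)

delta = gate_nu(x, W1, b1, W2, b2, gamma=2.0)
assert delta.shape == (4, 8)                          # 与目标参数维度对齐
assert (delta >= 0.0).all() and (delta <= 2.0).all()  # 逐元素落在 [0, gamma] 区间
```

<p>可以看到输出逐元素落在 [0, gamma] 区间内，因此它可以直接作为逐元素相乘的调制系数使用。</p>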
<section id="epnet">
<h3><span class="section-number">3.5.2.1.1. </span>EPNet：场景感知的嵌入个性化<a class="headerlink" href="#epnet" title="Permalink to this heading">¶</a></h3>
<p>在实际的推荐模型中，特征Embedding层的参数量往往是最大的，共享底层Embedding也成为业界的标准做法。但在多场景建模中，这种底层共享机制更多强调不同场景之间的共性，忽略了不同场景下Embedding的差异性。为此，EPNet在Embedding层的基础上引入场景先验信息，通过门控机制以较低的参数量实现Embedding层的场景个性化。</p>
<p>EPNet 中Embedding层的门控单元<span class="math notranslate nohighlight">\(U_{ep}\)</span>，以场景共享Embedding
<span class="math notranslate nohighlight">\(E\)</span>和输入的场景先验特征的Embedding
<span class="math notranslate nohighlight">\(E(\mathcal{F}_d)\)</span>拼接后的结果作为输入，EPNet的场景个性化输出<span class="math notranslate nohighlight">\(\delta_{domain}\)</span>表示如下：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-1">
<span class="eqno">(3.5.10)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-1" title="Permalink to this equation">¶</a></span>\[\delta_{domain} = \mathcal{U}_{ep}(E(\mathcal{F}_d) \oplus (\oslash(E))),\]</div>
<p>其中，<span class="math notranslate nohighlight">\(U_{ep}\)</span>是EPNet模块的Gate NU网络。</p>
<p>为了让场景感知的个性化模块EPNet不影响底层共享Embedding的学习，在计算个性化门控结果时阻断共享Embedding的反向梯度，<span class="math notranslate nohighlight">\(\oslash\)</span>表示Stop Gradient操作。</p>
<p>然后，通过元素级乘积得到场景个性化的Embedding表征<span class="math notranslate nohighlight">\(O_{ep}\)</span>：</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-2">
<span class="eqno">(3.5.11)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-2" title="Permalink to this equation">¶</a></span>\[O_{ep} = \delta_{domain} \otimes E\]</div>
<p>通过将场景个性化先验信息整合到Embedding层中，EPNet可以有效地平衡多场景之间的共性和差异。</p>
<p>EPNet的实现代码如下：</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">EPNet</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Layer</span><span class="p">):</span>
    <span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span>
                 <span class="n">l2_reg</span><span class="o">=</span><span class="mf">0.</span><span class="p">,</span>
                 <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">EPNet</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>  <span class="c1"># 先初始化父类，再设置属性</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">l2_reg</span> <span class="o">=</span> <span class="n">l2_reg</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">gate_nu</span> <span class="o">=</span> <span class="kc">None</span>

    <span class="k">def</span><span class="w"> </span><span class="nf">build</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">input_shape</span><span class="p">):</span>
        <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">input_shape</span><span class="p">)</span> <span class="o">==</span> <span class="mi">2</span>
        <span class="n">shape1</span><span class="p">,</span> <span class="n">shape2</span> <span class="o">=</span> <span class="n">input_shape</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">gate_nu</span> <span class="o">=</span> <span class="n">GateNU</span><span class="p">(</span><span class="n">hidden_units</span><span class="o">=</span><span class="p">[</span><span class="n">shape2</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="n">shape2</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]],</span> <span class="n">l2_reg</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">l2_reg</span><span class="p">)</span>

    <span class="k">def</span><span class="w"> </span><span class="nf">call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">domain</span><span class="p">,</span> <span class="n">emb</span> <span class="o">=</span> <span class="n">inputs</span>
        <span class="c1"># stop_gradient 阻断系数支路对 emb 的反向梯度，避免过度耦合。</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">gate_nu</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">domain</span><span class="p">,</span> <span class="n">tf</span><span class="o">.</span><span class="n">stop_gradient</span><span class="p">(</span><span class="n">emb</span><span class="p">)],</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">))</span> <span class="o">*</span> <span class="n">emb</span>  <span class="c1"># 输出形状 [B, D_emb]</span>
</pre></div>
</div>
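<p>To make the gating logic above concrete, here is a minimal, self-contained sketch: a plain two-layer Keras MLP stands in for GateNU (an assumption for illustration, not the actual implementation), with its sigmoid output scaled by 2 so each gate coefficient lies in (0, 2), applied to dummy domain and embedding tensors:</p>

```python
import tensorflow as tf

B, D_dom, D_emb = 4, 8, 16
domain = tf.random.normal([B, D_dom])
emb = tf.random.normal([B, D_emb])

# Hypothetical stand-in for GateNU: ReLU hidden layer, then sigmoid scaled by 2
gate = tf.keras.Sequential([
    tf.keras.layers.Dense(D_emb, activation='relu'),
    tf.keras.layers.Dense(D_emb, activation='sigmoid'),
])

# The gate sees emb, but stop_gradient keeps gradients from flowing back into it
gate_in = tf.concat([domain, tf.stop_gradient(emb)], axis=-1)
scaled_emb = 2.0 * gate(gate_in) * emb  # elementwise scaling, shape [B, D_emb]
```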
</section>
<section id="ppnet">
<h3><span class="section-number">3.5.2.1.2. </span>PPNet: User-Aware Parameter Personalization<a class="headerlink" href="#ppnet" title="Permalink to this heading">¶</a></h3>
<p>EPNet addresses the seesaw problem across scenarios, while PPNet targets the seesaw effect across tasks. Whereas multi-task models such as MMoE and PLE personalize at the task level, PPNet can be viewed as personalization at the sample level. PEPNet takes the user ID, item ID, and author ID as personalized prior features, concatenates them with the scenario-personalized embedding
<span class="math notranslate nohighlight">\(O_{ep}\)</span> produced by the EPNet above, and feeds the result to the parameter-personalization gate <span class="math notranslate nohighlight">\(U_{pp}\)</span> of every task-tower DNN.</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-3">
<span class="eqno">(3.5.12)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-3" title="Permalink to this equation">¶</a></span>\[\begin{split}O_{prior} = E(F_u) \oplus E(F_i) \oplus E(F_a) \\
\delta_{task}  = \mathcal{U}_{pp} \left( O_{prior} \oplus \oslash(O_{ep}) \right)\end{split}\]</div>
<p>To keep the PPNet module from disturbing the parameter updates of EPNet, no gradient flows back through the <span class="math notranslate nohighlight">\(O_{ep}\)</span> term (it is passed through a stop-gradient) when computing <span class="math notranslate nohighlight">\(\delta_{task}\)</span>; <span class="math notranslate nohighlight">\(\delta_{task}\)</span> denotes the output of the user-personalized gate.</p>
<p>The user-personalized gate output is then applied to every DNN layer in all task towers. As the architecture diagram shows, the personalized gates are shared by the DNNs of the different task towers. For the <span class="math notranslate nohighlight">\(l\)</span>-th DNN layer of a given task, the parameter personalization is expressed as:</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-4">
<span class="eqno">(3.5.13)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-4" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
\mathbf{O}^{(l)}_{pp} &amp;= \delta^{(l)}_{task} \otimes \mathbf{H}^{(l)} \\
\mathbf{H}^{(l+1)} &amp;= f\left( \mathbf{O}^{(l)}_{pp} \mathbf{W}^{(l)} + \mathbf{b}^{(l)} \right), l \in \{1, \ldots, L\}
\end{aligned}\end{split}\]</div>
<p>Here <span class="math notranslate nohighlight">\(L\)</span> is the total number of DNN layers in a task tower, <span class="math notranslate nohighlight">\(\mathbf{H}^{(l)}\)</span> is the output of the <span class="math notranslate nohighlight">\(l\)</span>-th DNN layer and the input to layer <span class="math notranslate nohighlight">\(l+1\)</span>, and <span class="math notranslate nohighlight">\(\mathbf{O}^{(l)}_{pp}\)</span> is that output scaled by the personalized parameter gate <span class="math notranslate nohighlight">\(\delta_{task}\)</span>.</p>
<p>The PPNet implementation is as follows:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">PPNet</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Layer</span><span class="p">):</span>
    <span class="c1"># Core idea: use persona features to generate per-layer, per-tower gate coefficients that scale outputs dimension-wise</span>
    <span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span>
                 <span class="n">multiples</span><span class="p">,</span>
                 <span class="n">hidden_units</span><span class="p">,</span>
                 <span class="n">activation</span><span class="p">,</span>
                 <span class="n">dropout</span><span class="o">=</span><span class="mf">0.</span><span class="p">,</span>
                 <span class="n">l2_reg</span><span class="o">=</span><span class="mf">0.</span><span class="p">,</span>
                 <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">PPNet</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>  <span class="c1"># call first so Keras can track the attributes below</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">hidden_units</span> <span class="o">=</span> <span class="n">hidden_units</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">l2_reg</span> <span class="o">=</span> <span class="n">l2_reg</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">multiples</span> <span class="o">=</span> <span class="n">multiples</span>
        <span class="c1"># One identically structured stack of layers per tower</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">dense_layers</span> <span class="o">=</span> <span class="p">[</span>
            <span class="p">[</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="n">u</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="n">activation</span><span class="p">,</span>
                                   <span class="n">kernel_regularizer</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">regularizers</span><span class="o">.</span><span class="n">l2</span><span class="p">(</span><span class="n">l2_reg</span><span class="p">))</span>
             <span class="k">for</span> <span class="n">u</span> <span class="ow">in</span> <span class="n">hidden_units</span><span class="p">]</span>
            <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">multiples</span><span class="p">)</span>
        <span class="p">]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">dropout_layers</span> <span class="o">=</span> <span class="p">[</span>
            <span class="p">[</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">hidden_units</span><span class="p">]</span>
            <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">multiples</span><span class="p">)</span>
        <span class="p">]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">gate_nu</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">def</span><span class="w"> </span><span class="nf">build</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">input_shape</span><span class="p">):</span>
        <span class="c1"># Build one gate per layer: output dim is units * multiples, later split per tower</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">gate_nu</span> <span class="o">=</span> <span class="p">[</span>
            <span class="n">GateNU</span><span class="p">([</span><span class="n">units</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">multiples</span><span class="p">,</span> <span class="n">units</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">multiples</span><span class="p">],</span> <span class="n">l2_reg</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">l2_reg</span><span class="p">)</span>
            <span class="k">for</span> <span class="n">units</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">hidden_units</span>
        <span class="p">]</span>

    <span class="k">def</span><span class="w"> </span><span class="nf">call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">,</span> <span class="n">training</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="n">inputs</span><span class="p">,</span> <span class="n">persona</span> <span class="o">=</span> <span class="n">inputs</span>

        <span class="c1"># First compute the per-layer gates from persona ⊕ stop_gradient(inputs)</span>
        <span class="n">gate_list</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="n">concat_in</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">persona</span><span class="p">,</span> <span class="n">tf</span><span class="o">.</span><span class="n">stop_gradient</span><span class="p">(</span><span class="n">inputs</span><span class="p">)],</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">gate</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">gate_nu</span><span class="p">):</span>
            <span class="n">g</span> <span class="o">=</span> <span class="n">gate</span><span class="p">(</span><span class="n">concat_in</span><span class="p">)</span>                     <span class="c1"># [B, units*multiples]</span>
            <span class="n">g</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">g</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">multiples</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># one [B, units] slice per tower</span>
            <span class="n">gate_list</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">g</span><span class="p">)</span>

        <span class="c1"># Forward per tower: after each Dense layer, modulate the output dimension-wise with the gate</span>
        <span class="n">outputs</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">multiples</span><span class="p">):</span>
            <span class="n">x</span> <span class="o">=</span> <span class="n">inputs</span>
            <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">hidden_units</span><span class="p">)):</span>
                <span class="n">x</span> <span class="o">=</span> <span class="n">gate_list</span><span class="p">[</span><span class="n">i</span><span class="p">][</span><span class="n">n</span><span class="p">]</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">dense_layers</span><span class="p">[</span><span class="n">n</span><span class="p">][</span><span class="n">i</span><span class="p">](</span><span class="n">x</span><span class="p">)</span>
                <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout_layers</span><span class="p">[</span><span class="n">n</span><span class="p">][</span><span class="n">i</span><span class="p">](</span><span class="n">x</span><span class="p">,</span> <span class="n">training</span><span class="o">=</span><span class="n">training</span><span class="p">)</span>
            <span class="n">outputs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">outputs</span>
</pre></div>
</div>
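<p>A minimal, self-contained walkthrough of the same gate-split-modulate pattern in PPNet's forward pass, for two towers with one hidden layer each; the two-layer MLP again stands in for GateNU (an assumption for illustration):</p>

```python
import tensorflow as tf

B, D_in, D_persona = 4, 32, 12
multiples, units = 2, 16  # two task towers, one hidden layer of width 16

inputs = tf.random.normal([B, D_in])
persona = tf.random.normal([B, D_persona])

# GateNU-like stand-in: emits units * multiples coefficients in (0, 2)
gate = tf.keras.Sequential([
    tf.keras.layers.Dense(units * multiples, activation='relu'),
    tf.keras.layers.Dense(units * multiples, activation='sigmoid'),
])
g = 2.0 * gate(tf.concat([persona, tf.stop_gradient(inputs)], axis=-1))
g_per_tower = tf.split(g, multiples, axis=1)  # list of [B, units] tensors

# Each tower has its own Dense layer; its gate slice scales the output dimension-wise
towers = [tf.keras.layers.Dense(units, activation='relu') for _ in range(multiples)]
outputs = [g_per_tower[n] * towers[n](inputs) for n in range(multiples)]
```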
<p><strong>Code in practice</strong></p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">funrec</span><span class="w"> </span><span class="kn">import</span> <span class="n">run_experiment</span>

<span class="n">run_experiment</span><span class="p">(</span><span class="s1">&#39;mmoe&#39;</span><span class="p">)</span>
</pre></div>
</div>
<div class="output highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">+----------------+---------------+-----------------+-------------+-----------------+----------------+------------------+--------------+---------------------+--------------------+----------------------+</span>
<span class="o">|</span>   <span class="n">auc_is_click</span> <span class="o">|</span>   <span class="n">auc_is_like</span> <span class="o">|</span>   <span class="n">auc_long_view</span> <span class="o">|</span>   <span class="n">auc_macro</span> <span class="o">|</span>   <span class="n">gauc_is_click</span> <span class="o">|</span>   <span class="n">gauc_is_like</span> <span class="o">|</span>   <span class="n">gauc_long_view</span> <span class="o">|</span>   <span class="n">gauc_macro</span> <span class="o">|</span>   <span class="n">val_user_is_click</span> <span class="o">|</span>   <span class="n">val_user_is_like</span> <span class="o">|</span>   <span class="n">val_user_long_view</span> <span class="o">|</span>
<span class="o">+================+===============+=================+=============+=================+================+==================+==============+=====================+====================+======================+</span>
<span class="o">|</span>         <span class="mf">0.6034</span> <span class="o">|</span>        <span class="mf">0.4272</span> <span class="o">|</span>          <span class="mf">0.4362</span> <span class="o">|</span>      <span class="mf">0.4889</span> <span class="o">|</span>          <span class="mf">0.5756</span> <span class="o">|</span>         <span class="mf">0.4461</span> <span class="o">|</span>           <span class="mf">0.4511</span> <span class="o">|</span>       <span class="mf">0.4909</span> <span class="o">|</span>                 <span class="mi">928</span> <span class="o">|</span>                <span class="mi">530</span> <span class="o">|</span>                  <span class="mi">925</span> <span class="o">|</span>
<span class="o">+----------------+---------------+-----------------+-------------+-----------------+----------------+------------------+--------------+---------------------+--------------------+----------------------+</span>
</pre></div>
</div>
</section>
</section>
<section id="apg">
<h2><span class="section-number">3.5.2.2. </span>APG<a class="headerlink" href="#apg" title="Permalink to this heading">¶</a></h2>
<p>The previous subsection showed how PEPNet feeds scenario and personalization prior signals into gate networks and applies the gate outputs to the bottom embeddings and to the DNN layers of the task towers, realizing gate-based personalization of scenario and task-tower parameters. The APG (Adaptive Parameter Generation) model <cite>yan2022apg</cite> introduced in this section also pursues sample-level personalization, but in a different way: its core idea is to dynamically generate the model parameters for each sample, thereby increasing model capacity and expressiveness.</p>
<p>APG generates adaptive parameters from a <strong>sample-aware input</strong>. This applies to most problems of modeling mixed sample distributions, multi-scenario modeling included. Concretely, APG proposes three strategies for building the condition representation of a sample:</p>
<ol class="arabic simple">
<li><p>Group-wise: suitable when samples can be partitioned into groups whose members share similar patterns; group-related features serve as the input.</p></li>
<li><p>Mix-wise: combines several factors, enabling finer-grained sample groupings, up to a distinct set of parameters per sample, e.g., taking the &lt;user, item&gt; pair vector as input.</p></li>
<li><p>Self-wise: needs no prior knowledge; the hidden-layer output of the deep CTR model itself serves as the input for parameter generation.</p></li>
</ol>
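<p>The three strategies differ only in how the condition representation is assembled. A hedged sketch with dummy tensors (all names and dimensions illustrative):</p>

```python
import tensorflow as tf

B = 4
# Group-wise: a scene/group id embedding serves as the condition z_i
scene_id = tf.constant([0, 1, 2, 0])
scene_table = tf.keras.layers.Embedding(input_dim=8, output_dim=8)
z_group = scene_table(scene_id)                   # [B, 8]

# Mix-wise: concatenating user and item embeddings gives a finer-grained z_i
user_emb = tf.random.normal([B, 16])
item_emb = tf.random.normal([B, 16])
z_mix = tf.concat([user_emb, item_emb], axis=-1)  # [B, 32]

# Self-wise: a hidden layer of the base model itself is reused as z_i
z_self = tf.keras.layers.Dense(24, activation='relu')(z_mix)  # [B, 24]
```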
<p>APG generates the parameters adaptively with an MLP: the sample-aware input <span class="math notranslate nohighlight">\(\mathbf{z_i}\)</span> is fed through the MLP and the output is reshaped into a matrix,</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-5">
<span class="eqno">(3.5.14)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-5" title="Permalink to this equation">¶</a></span>\[\mathbf{W}_i = \text{reshape}(\text{MLP}( \mathbf{z}_i ))\]</div>
<p>The generated matrix is equivalent to an MLP weight matrix: with a matrix multiplication and an activation function it reproduces the behavior of an MLP layer. A click-through-rate prediction, for example, can be written as:</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-6">
<span class="eqno">(3.5.15)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-6" title="Permalink to this equation">¶</a></span>\[y_i = \sigma(\mathbf{W}_i \mathbf{x}_i)\]</div>
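<p>The two equations above can be exercised directly: a generator network (a single Dense layer here, standing in for the MLP) emits all N*M weights per sample, which are reshaped into per-sample matrices and applied with a batched matmul. Dimensions are illustrative:</p>

```python
import tensorflow as tf

B, N, M = 4, 32, 1            # batch size, input dim, output dim
z = tf.random.normal([B, 8])  # condition representation z_i
x = tf.random.normal([B, N])

# Generator: emits N*M weights per sample, reshaped into W_i
gen = tf.keras.layers.Dense(N * M)
W = tf.reshape(gen(z), [B, N, M])  # [B, N, M]

# y_i = sigma(W_i x_i) via a batched matmul over the batch dimension
y = tf.squeeze(tf.sigmoid(tf.matmul(tf.expand_dims(x, 1), W)), 1)  # [B, M]
```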
<p>The core idea of APG is simple, but deploying it in production requires a few optimizations.</p>
<p><strong>Low-rank parameterization</strong>: drawing on low-rank methods, APG assumes the adaptive parameters have low-rank structure and factorizes the weight matrix into a product of three sub-matrices. A small rank keeps compute and storage costs under control, while the rank can be enlarged when a bigger parameter space is needed. The factorization is:</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-7">
<span class="eqno">(3.5.16)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-7" title="Permalink to this equation">¶</a></span>\[\mathbf{U}_i, \mathbf{S}_i, \mathbf{V}_i = \text{reshape}(\text{MLP}( \mathbf{z}_i ))\]</div>
<figure class="align-default" id="id8">
<span id="apg-parameter-decomposition"></span><a class="reference internal image-reference" href="../../_images/apg_1.png"><img alt="../../_images/apg_1.png" src="../../_images/apg_1.png" style="width: 300px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 3.5.6 </span><span class="caption-text">Parameter decomposition</span><a class="headerlink" href="#id8" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p><strong>Decomposed feed-forward</strong>: on top of the low-rank parameterization, APG computes the forward pass by multiplying the input by each sub-matrix in turn. This avoids the costly sub-matrix products and lowers the overall computational complexity:</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-8">
<span class="eqno">(3.5.17)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-8" title="Permalink to this equation">¶</a></span>\[y_i = \sigma(\mathbf{W}_i \mathbf{x}_i) = \sigma((\mathbf{U}_i \mathbf{S}_i \mathbf{V}_i) \mathbf{x}_i) = \sigma(\mathbf{U}_i (\mathbf{S}_i (\mathbf{V}_i \mathbf{x}_i)))\]</div>
<figure class="align-default" id="id9">
<span id="apg-forward-computation-optimization"></span><a class="reference internal image-reference" href="../../_images/apg_2.png"><img alt="../../_images/apg_2.png" src="../../_images/apg_2.png" style="width: 300px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 3.5.7 </span><span class="caption-text">Forward-computation optimization</span><a class="headerlink" href="#id9" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
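<p>The saving is easy to quantify: with input dimension N, output dimension M, and rank K much smaller than min(N, M), the generator must emit N·K + K² + K·M values per sample instead of N·M. A quick back-of-the-envelope check (dimensions illustrative):</p>

```python
# Per-sample generated-parameter counts: full W_i versus the U_i / S_i / V_i split
N, M, K = 512, 512, 8

full_params = N * M                      # 512 * 512 = 262144
low_rank_params = N * K + K * K + K * M  # 4096 + 64 + 4096 = 8256
```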
<p><strong>Parameter sharing and over-parameterization</strong>: thanks to the flexibility of the factorization, APG splits the parameter matrices into private and shared parts. Private parameters capture what is specific to each sample, while shared parameters capture what samples have in common, so the model keeps expressing sample commonality while generating adaptive parameters, and the smaller private part also reduces compute and storage costs. In addition, APG replaces each shared matrix with the product of two larger matrices; this over-parameterization raises model capacity and carries an implicit regularization effect that helps prevent overfitting.</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-9">
<span class="eqno">(3.5.18)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-9" title="Permalink to this equation">¶</a></span>\[\begin{split}\mathbf{S}_{i}=\text{reshape}(\text{MLP}(\mathbf{z}_{i})) \\
\mathbf{U} = \mathbf{U}^l \mathbf{U}^r, \mathbf{V} = \mathbf{V}^l \mathbf{V}^r \\
y_{i}=\sigma(\mathbf{U}(\mathbf{S}_{i}\mathbf{V}\mathbf{x}_{i}))\end{split}\]</div>
<figure class="align-default" id="id10">
<span id="apg-parameter-sharing-and-overparameterization"></span><a class="reference internal image-reference" href="../../_images/apg_3.png"><img alt="../../_images/apg_3.png" src="../../_images/apg_3.png" style="width: 300px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 3.5.8 </span><span class="caption-text">Parameter sharing and over-parameterization</span><a class="headerlink" href="#id10" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p>Core code of the APG layer:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">APGLayer</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Layer</span><span class="p">):</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd">    APG 自适应参数生成层（简化版）</span>
<span class="sd">    核心：共享 U/V（输入→K、K→输出） + 样本私有 S_i（K×K）。</span>
<span class="sd">    用场景/样本嵌入逐样本生成 S_i，对 K 表示进行调制，体现“按样本生成参数”的低秩形式。</span>
<span class="sd">    仅保留 K 路：输入到 K（共享）→ K×K 调制（私有）→ K 到输出（共享）。</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">input_dim</span><span class="p">,</span> <span class="n">output_dim</span><span class="p">,</span> <span class="n">scene_emb_dim</span><span class="p">,</span>
                 <span class="n">activation</span><span class="o">=</span><span class="s1">&#39;relu&#39;</span><span class="p">,</span> <span class="n">generate_activation</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
                 <span class="n">inner_activation</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">mf_k</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span><span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">APGLayer</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">input_dim</span> <span class="o">=</span> <span class="n">input_dim</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">output_dim</span> <span class="o">=</span> <span class="n">output_dim</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">scene_emb_dim</span> <span class="o">=</span> <span class="n">scene_emb_dim</span>

        <span class="c1"># Activation functions (optional)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">activation</span> <span class="o">=</span> <span class="n">get_activation</span><span class="p">(</span><span class="n">activation</span><span class="p">)</span> <span class="k">if</span> <span class="n">activation</span> <span class="k">else</span> <span class="kc">None</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">inner_activation</span> <span class="o">=</span> <span class="n">get_activation</span><span class="p">(</span><span class="n">inner_activation</span><span class="p">)</span> <span class="k">if</span> <span class="n">inner_activation</span> <span class="k">else</span> <span class="kc">None</span>

        <span class="c1"># Low-rank dimension K: derived from the smaller of the input/output dims</span>
        <span class="n">min_dim</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="n">input_dim</span><span class="p">,</span> <span class="n">output_dim</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">k_dim</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">min_dim</span> <span class="o">/</span> <span class="n">mf_k</span><span class="p">))</span>  <span class="c1"># K ≪ min(N, M)</span>

        <span class="c1"># Sample-specific factor S_i: per-sample K×K weights and K bias generated from the scene embedding</span>
        <span class="n">kk_weight_size</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">k_dim</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">k_dim</span>  <span class="c1"># K×K</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">specific_weight_kk</span> <span class="o">=</span> <span class="n">DNNs</span><span class="p">([</span><span class="n">kk_weight_size</span><span class="p">],</span> <span class="n">activation</span><span class="o">=</span><span class="n">generate_activation</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">specific_bias_kk</span> <span class="o">=</span> <span class="n">DNNs</span><span class="p">([</span><span class="bp">self</span><span class="o">.</span><span class="n">k_dim</span><span class="p">],</span> <span class="n">activation</span><span class="o">=</span><span class="n">generate_activation</span><span class="p">)</span>

        <span class="c1"># Shared factors U/V: fixed NK (input→K) and KM (K→output) weights and biases</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">shared_weight_nk</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_weight</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">input_dim</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">k_dim</span><span class="p">),</span> <span class="n">initializer</span><span class="o">=</span><span class="s1">&#39;glorot_uniform&#39;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;shared_weight_nk&#39;</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">shared_bias_nk</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_weight</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">k_dim</span><span class="p">,),</span> <span class="n">initializer</span><span class="o">=</span><span class="s1">&#39;zeros&#39;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;shared_bias_nk&#39;</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">shared_weight_km</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_weight</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">k_dim</span><span class="p">,</span> <span class="n">output_dim</span><span class="p">),</span> <span class="n">initializer</span><span class="o">=</span><span class="s1">&#39;glorot_uniform&#39;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;shared_weight_km&#39;</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">shared_bias_km</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_weight</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">output_dim</span><span class="p">,),</span> <span class="n">initializer</span><span class="o">=</span><span class="s1">&#39;zeros&#39;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;shared_bias_km&#39;</span><span class="p">)</span>

    <span class="k">def</span><span class="w"> </span><span class="nf">call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">):</span>
<span class="w">        </span><span class="sd">&quot;&quot;&quot;</span>
<span class="sd">        前向（分解前向）：</span>
<span class="sd">        x → NK（共享U）→ K → S_i（样本私有KK调制）→ K_mod → KM（共享V）→ 输出 → activation</span>
<span class="sd">        &quot;&quot;&quot;</span>
        <span class="n">x</span><span class="p">,</span> <span class="n">scene_emb</span> <span class="o">=</span> <span class="n">inputs</span>

        <span class="c1"># Generate the sample-specific K×K weights and bias (S_i)</span>
        <span class="n">specific_weight_kk</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">specific_weight_kk</span><span class="p">(</span><span class="n">scene_emb</span><span class="p">)</span>              <span class="c1"># [B, K*K]</span>
        <span class="n">specific_weight_kk</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">specific_weight_kk</span><span class="p">,</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">k_dim</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">k_dim</span><span class="p">))</span>  <span class="c1"># [B, K, K]</span>
        <span class="n">specific_bias_kk</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">specific_bias_kk</span><span class="p">(</span><span class="n">scene_emb</span><span class="p">)</span>                  <span class="c1"># [B, K]</span>

        <span class="c1"># NK: input to K (shared U)</span>
        <span class="n">k</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">shared_weight_nk</span><span class="p">)</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">shared_bias_nk</span>        <span class="c1"># [B, K]</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">inner_activation</span><span class="p">:</span>
            <span class="n">k</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">inner_activation</span><span class="p">(</span><span class="n">k</span><span class="p">)</span>

        <span class="c1"># KK: sample-specific modulation (batched matmul): [B,1,K] × [B,K,K] → [B,1,K]</span>
        <span class="n">k_mod</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">specific_weight_kk</span><span class="p">)</span>          <span class="c1"># [B, 1, K]</span>
        <span class="n">k_mod</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">k_mod</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">specific_bias_kk</span>                      <span class="c1"># [B, K]</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">inner_activation</span><span class="p">:</span>
            <span class="n">k_mod</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">inner_activation</span><span class="p">(</span><span class="n">k_mod</span><span class="p">)</span>

        <span class="c1"># KM: K to output (shared V)</span>
        <span class="n">output</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">k_mod</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">shared_weight_km</span><span class="p">)</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">shared_bias_km</span>

        <span class="c1"># Final activation (optional)</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">activation</span><span class="p">:</span>
            <span class="n">output</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">activation</span><span class="p">(</span><span class="n">output</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">output</span>
</pre></div>
</div>
<p><strong>Code Practice</strong></p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">run_experiment</span><span class="p">(</span><span class="s1">&#39;apg&#39;</span><span class="p">)</span>
</pre></div>
</div>
<div class="output highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">+--------+--------+------------+</span>
<span class="o">|</span>    <span class="n">auc</span> <span class="o">|</span>   <span class="n">gauc</span> <span class="o">|</span>   <span class="n">val_user</span> <span class="o">|</span>
<span class="o">+========+========+============+</span>
<span class="o">|</span> <span class="mf">0.6739</span> <span class="o">|</span> <span class="mf">0.6379</span> <span class="o">|</span>        <span class="mi">217</span> <span class="o">|</span>
<span class="o">+--------+--------+------------+</span>
</pre></div>
</div>
</section>
<section id="m2m">
<h2><span class="section-number">3.5.2.3. </span>M2M<a class="headerlink" href="#m2m" title="Permalink to this heading">¶</a></h2>
<p>In multi-scenario modeling for recommender systems, dynamic parameter generation has shown great potential. Besides APG, Multi-Scenario Multi-Task Meta Learning for advertiser modeling (M2M)
<span id="id4">(<a class="reference internal" href="../../chapter_references/references.html#id99" title="Zhang, Q., Liao, X., Liu, Q., Xu, J., &amp; Zheng, B. (2022). Leaving no one behind: a multi-scenario multi-task meta learning approach for advertiser modeling. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining (pp. 1368–1376).">Zhang <em>et al.</em>, 2022</a>)</span>
is another influential model in this area. M2M consists of a backbone network at the bottom and a meta-learning network on top; the two are introduced in turn below.</p>
<figure class="align-default" id="id11">
<span id="m2m-model-structure"></span><a class="reference internal image-reference" href="../../_images/m2m.png"><img alt="../../_images/m2m.png" src="../../_images/m2m.png" style="width: 500px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 3.5.9 </span><span class="caption-text">M2M model structure</span><a class="headerlink" href="#id11" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<section id="id5">
<h3><span class="section-number">3.5.2.3.1. </span>Backbone Network<a class="headerlink" href="#id5" title="Permalink to this heading">¶</a></h3>
<p>The backbone network consists of three parts: the expert representation <span class="math notranslate nohighlight">\(E_i\)</span>, the task representation <span class="math notranslate nohighlight">\(T_t\)</span>, and the scenario representation <span class="math notranslate nohighlight">\(\tilde{\mathbf{S}}\)</span>.</p>
<p><strong>Expert representation</strong>: the backbone contains a multi-expert structure. Each expert's input is the concatenation of the sequence features, aggregated by a multi-head attention mechanism, and the remaining features. The <span class="math notranslate nohighlight">\(i\)</span>-th expert is computed as:</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-10">
<span class="eqno">(3.5.19)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-10" title="Permalink to this equation">¶</a></span>\[\begin{aligned}
\mathbf{E}_i = f_{MLP}(\text{Concat}(f_{MHA}(X_{seq}), X_{other}))
\end{aligned}\]</div>
<p>where <span class="math notranslate nohighlight">\(X_{seq},X_{other}\)</span> denote the sequence features and all remaining features, respectively; the sequence features are aggregated by a multi-head attention network, while the other feature embeddings are concatenated directly. <span class="math notranslate nohighlight">\(f_{MLP}\)</span> denotes an MLP network and <span class="math notranslate nohighlight">\(f_{MHA}\)</span> denotes a multi-head attention network.</p>
<p><strong>Task representation</strong>:
to better capture the differences between tasks, M2M learns a global representation per task category; that is, each sample is paired with one representation for every task. The representation of the <span class="math notranslate nohighlight">\(t\)</span>-th task is:</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-11">
<span class="eqno">(3.5.20)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-11" title="Permalink to this equation">¶</a></span>\[\mathbf{T}_t = f_{MLP}(\text{Embedding}(t))\]</div>
<p><strong>Scenario representation</strong>: analogously to the task representations, to better capture the differences between scenarios, an MLP encodes the scenario information separately. Its input includes not only the directly scenario-related features <span class="math notranslate nohighlight">\(S\)</span> but also the advertiser-related features <span class="math notranslate nohighlight">\(A\)</span>. The scenario representation <span class="math notranslate nohighlight">\(\tilde{\mathbf{S}}\)</span> takes the form:</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-12">
<span class="eqno">(3.5.21)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-12" title="Permalink to this equation">¶</a></span>\[\tilde{\mathbf{S}}  = f_{MLP}(\mathbf{S}, \mathbf{A})\]</div>
<p>Unlike the task representations, every sample belongs to a specific scenario, so the scenario representation is not global. In addition, since the method was originally proposed for advertising, in practice the advertiser-related features can be replaced with features better suited to one's own business.</p>
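<p>The three backbone representations can be sketched with plain NumPy. This is a minimal toy version, not the paper's implementation: single-head attention pooling stands in for the multi-head network, single-layer ReLU MLPs stand in for <span class="math notranslate nohighlight">\(f_{MLP}\)</span>, and all shapes and names are illustrative assumptions.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

B, L_seq, D = 4, 10, 8      # batch size, sequence length, embedding dim
D_other, D_rep = 16, 32     # other-feature dim, representation dim


def mlp(x, w, b):
    """Single-layer ReLU network, standing in for f_MLP."""
    return np.maximum(x @ w + b, 0.0)


# --- Expert representation E_i: pooled sequence + other features -> MLP ---
x_seq = rng.normal(size=(B, L_seq, D))       # behavior-sequence embeddings
x_other = rng.normal(size=(B, D_other))      # non-sequence feature embeddings

# Toy single-head attention pooling (stand-in for multi-head attention)
scores = x_seq @ rng.normal(size=(D,))                       # [B, L] logits
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
seq_pooled = (attn[..., None] * x_seq).sum(axis=1)           # [B, D]

w_e = rng.normal(size=(D + D_other, D_rep))
E_i = mlp(np.concatenate([seq_pooled, x_other], axis=1),
          w_e, np.zeros(D_rep))                              # [B, D_rep]

# --- Task representation T_t: a global learned embedding per task ---
num_tasks = 3
task_table = rng.normal(size=(num_tasks, D_rep))             # Embedding(t)
T_t = task_table[1]                                          # task t = 1

# --- Scenario representation: MLP over scenario + advertiser features ---
s_feat = rng.normal(size=(B, 6))                             # scenario features S
a_feat = rng.normal(size=(B, 6))                             # advertiser features A
w_s = rng.normal(size=(12, D_rep))
S_tilde = mlp(np.concatenate([s_feat, a_feat], axis=1),
              w_s, np.zeros(D_rep))                          # [B, D_rep]
```

<p>Note that <code class="docutils literal notranslate"><span class="pre">T_t</span></code> is indexed from a global table shared by all samples, whereas <code class="docutils literal notranslate"><span class="pre">S_tilde</span></code> is computed per sample, mirroring the global-versus-per-sample distinction made above.</p>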
</section>
<section id="id6">
<h3><span class="section-number">3.5.2.3.2. </span>Meta-Learning Network<a class="headerlink" href="#id6" title="Permalink to this heading">¶</a></h3>
<p>In conventional machine learning, the weights <span class="math notranslate nohighlight">\((W, b)\)</span> are optimized by backpropagation on a fixed dataset, and the learning target is the task itself. In M2M's meta-learning network, an MLP (the meta learner) dynamically generates the weights <span class="math notranslate nohighlight">\((W, b)\)</span> of another network (the task model) from the input features. In effect, the MLP learns how to produce suitable task-model parameters for different input features, rather than learning the task directly, so that the task model can adapt dynamically to different tasks and input distributions; this is exactly the core goal of meta learning.</p>
<figure class="align-default" id="id12">
<span id="m2m-meta-unit-structure"></span><a class="reference internal image-reference" href="../../_images/m2m_meta_unit.png"><img alt="../../_images/m2m_meta_unit.png" src="../../_images/m2m_meta_unit.png" style="width: 800px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 3.5.10 </span><span class="caption-text">Meta Unit network structure</span><a class="headerlink" href="#id12" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p><strong>Meta-unit principle</strong>:
in M2M the meta unit explicitly models scenario information. To better capture dynamic, scenario-dependent signals, it takes the scenario representation <span class="math notranslate nohighlight">\(\tilde{\mathbf{S}}\)</span> produced by the backbone as input, maps it through an MLP into scenario-specific network parameters <span class="math notranslate nohighlight">\((W,b)\)</span>, and then applies the generated parameters to the input features. The <span class="math notranslate nohighlight">\(Meta\)</span> function below encapsulates the complete meta-learning procedure.</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-13">
<span class="eqno">(3.5.22)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-13" title="Permalink to this equation">¶</a></span>\[h^{output} = \text{Meta}(h^{input}),\]</div>
<p>where the meta unit proceeds as follows:</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-14">
<span class="eqno">(3.5.23)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-14" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
\text{Input: } &amp; \quad \text{scenario representation } \tilde{\mathbf{S}}, \text{ input features } \mathbf{h}_{\text{input}} \\
\text{Output: } &amp; \quad \text{output features } \mathbf{h}^{\text{output}} \\
\text{Steps:} \\
&amp; 1. \quad \text{Initialization:} \\
&amp; \quad \quad \mathbf{h}^{(0)} = \mathbf{h}_{\text{input}} \\
&amp; 2. \quad \text{Dynamic parameter generation:} \\
&amp; \quad \quad \text{for each meta layer (} i \text{ from } 1 \text{ to } K \text{):} \\
&amp; \quad \quad \quad \mathbf{W}^{(i-1)} = \text{Reshape}(\mathbf{V}_w \tilde{\mathbf{S}} + \mathbf{v}_w) \\
&amp; \quad \quad \quad \mathbf{b}^{(i-1)} = \text{Reshape}(\mathbf{V}_b \tilde{\mathbf{S}} + \mathbf{v}_b) \\
&amp; 3. \quad \text{Meta-learning transformation:} \\
&amp; \quad \quad \text{for each meta layer (} i \text{ from } 1 \text{ to } K \text{):} \\
&amp; \quad \quad \quad \mathbf{h}^{(i)} = \sigma(\mathbf{W}^{(i-1)} \mathbf{h}^{(i-1)} + \mathbf{b}^{(i-1)}) \\
&amp; 4. \quad \text{Output:} \\
&amp; \quad \quad \mathbf{h}^{\text{output}} = \mathbf{h}^{(K)}
\end{aligned}\end{split}\]</div>
<p>The meta unit is reused in the multi-expert fusion and multi-task tower modules below. Intuitively, a feature processed by the meta unit can be viewed as having passed through an MLP-like module whose parameters were generated from, and thus injected with, scenario information.</p>
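<p>The two-stage procedure above, generating per-sample parameters from the scenario representation and then applying them, can be sketched as follows. This is a minimal NumPy sketch under simplifying assumptions: square <span class="math notranslate nohighlight">\(D_h \times D_h\)</span> meta layers, <span class="math notranslate nohighlight">\(\tanh\)</span> as <span class="math notranslate nohighlight">\(\sigma\)</span>, and randomly initialized generator matrices <span class="math notranslate nohighlight">\((\mathbf{V}_w, \mathbf{v}_w, \mathbf{V}_b, \mathbf{v}_b)\)</span> that would in practice be trained end to end.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
B, D_s, D_h, K = 4, 16, 8, 2   # batch, scenario dim, hidden dim, meta layers


def meta_unit(h_input, s_tilde, params):
    """Apply K meta layers whose (W, b) are generated from s_tilde."""
    h = h_input
    for V_w, v_w, V_b, v_b in params:
        # Dynamic parameter generation: scenario -> flat weights, then Reshape
        W = (s_tilde @ V_w + v_w).reshape(-1, D_h, D_h)   # [B, D_h, D_h]
        b = s_tilde @ V_b + v_b                           # [B, D_h]
        # Per-sample affine transform followed by the nonlinearity sigma
        h = np.tanh(np.einsum('bi,bij->bj', h, W) + b)
    return h


# One (V_w, v_w, V_b, v_b) tuple per meta layer; scaled small for stability
params = [
    (rng.normal(size=(D_s, D_h * D_h)) * 0.1, np.zeros(D_h * D_h),
     rng.normal(size=(D_s, D_h)) * 0.1, np.zeros(D_h))
    for _ in range(K)
]

s_tilde = rng.normal(size=(B, D_s))    # scenario representation from the backbone
h_in = rng.normal(size=(B, D_h))       # h_input
h_out = meta_unit(h_in, s_tilde, params)   # [B, D_h]
```

<p>Because <code class="docutils literal notranslate"><span class="pre">W</span></code> and <code class="docutils literal notranslate"><span class="pre">b</span></code> carry a batch dimension, two samples from different scenarios pass through genuinely different affine maps, which is the point of the meta unit.</p>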
<p><strong>Attention Meta Network</strong></p>
<p>The conventional way to fuse multiple experts feeds part of each sample's features into a gating network that produces the fusion weights. During training the gate learns the relationship between tasks and experts, but it ignores the scenario. The attention network therefore injects scenario information when computing the fusion weights, making them personalized per scenario. The weights are computed as:</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-15">
<span class="eqno">(3.5.24)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-15" title="Permalink to this equation">¶</a></span>\[\begin{split}a_{t_i} = \mathbf{v}^T \text{Meta}_t([\mathbf{E}_i \parallel \mathbf{T}_t]) \\
\alpha_{t_i} = \frac{\exp(a_{t_i})}{\sum_{j=1}^M \exp(a_{t_j})} \\
\mathbf{R}_t = \sum_{i=1}^M \alpha_{t_i} \mathbf{E}_{i}\end{split}\]
<p>where <span class="math notranslate nohighlight">\(\mathbf{R}_t\)</span> is the fused multi-expert representation for task <span class="math notranslate nohighlight">\(t\)</span>, and <span class="math notranslate nohighlight">\(E_i,T_t\)</span> are the representation of the <span class="math notranslate nohighlight">\(i\)</span>-th expert and the task representation of task <span class="math notranslate nohighlight">\(t\)</span>, respectively.</p>
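<p>The scoring, softmax, and weighted-sum steps can be sketched as below. This is a toy NumPy version: <code class="docutils literal notranslate"><span class="pre">meta_t</span></code> is a hypothetical stand-in that conditions a <span class="math notranslate nohighlight">\(\tanh\)</span> layer on a scalar scenario gain rather than the full meta unit, and all shapes and initializations are illustrative.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
B, M, D = 4, 3, 8   # batch size, number of experts, representation dim


def meta_t(x, s_gain):
    # Hypothetical stand-in for Meta_t: a scenario-conditioned tanh layer;
    # in M2M this would be a full meta unit with generated (W, b)
    return np.tanh(x * s_gain)


E = rng.normal(size=(B, M, D))         # expert representations E_1..E_M
T_t = rng.normal(size=(B, D))          # task representation for task t
v = rng.normal(size=(2 * D,))          # projection vector v
s_gain = 1.0 + 0.1 * rng.normal(size=(B, 1, 1))   # toy scenario conditioning

# [E_i || T_t] for every expert, then score, softmax, and fuse
concat = np.concatenate([E, np.repeat(T_t[:, None, :], M, axis=1)], axis=2)
a = meta_t(concat, s_gain) @ v                               # [B, M] logits
alpha = np.exp(a) / np.exp(a).sum(axis=1, keepdims=True)     # softmax over experts
R_t = (alpha[..., None] * E).sum(axis=1)                     # [B, D] fused R_t
```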
<p><strong>Tower Meta Network</strong>:
to further strengthen the scenario representation, and analogously to the attention meta network, a meta unit is also applied at the output of each task tower, implemented with residual connections. The tower meta network takes the form:</p>
<div class="math notranslate nohighlight" id="equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-16">
<span class="eqno">(3.5.25)<a class="headerlink" href="#equation-chapter-2-ranking-5-multi-scenario-2-dynamic-weight-16" title="Permalink to this equation">¶</a></span>\[\begin{split}\mathbf{L}_t^{(0)} = \mathbf{R}_t, \\
\mathbf{L}_t^{(j)} = \sigma( \text{Meta}^{(j-1)}( \mathbf{L}_t^{(j-1)} ) + \mathbf{L}_t^{(j-1)} ), \quad \forall j \in 1, 2, \ldots, L\end{split}\]</div>
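<p>The residual recursion above is straightforward to sketch. In this minimal NumPy version the meta layer uses fixed matrices for brevity; in M2M each <span class="math notranslate nohighlight">\(\text{Meta}^{(j)}\)</span> would instead generate its <span class="math notranslate nohighlight">\((W, b)\)</span> from the scenario representation, and <span class="math notranslate nohighlight">\(\tanh\)</span> stands in for <span class="math notranslate nohighlight">\(\sigma\)</span>.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
B, D, L = 4, 8, 3   # batch size, representation dim, number of tower layers


def meta_layer(x, W, b):
    # Stand-in for Meta^{(j)}: in M2M these (W, b) would be generated
    # per sample from the scenario representation; fixed here for brevity
    return x @ W + b


layers = [(0.1 * rng.normal(size=(D, D)), np.zeros(D)) for _ in range(L)]

h = rng.normal(size=(B, D))                # L_t^{(0)} = R_t from the attention net
for W, b in layers:
    h = np.tanh(meta_layer(h, W, b) + h)   # residual: sigma(Meta(h) + h)
```

<p>The residual path lets the tower fall back on the fused expert representation when the scenario-conditioned transform contributes little, which tends to stabilize training of generated-parameter layers.</p>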
<p><strong>Code Practice</strong></p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">run_experiment</span><span class="p">(</span><span class="s1">&#39;m2m&#39;</span><span class="p">)</span>
</pre></div>
</div>
<div class="output highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">+----------------+---------------+-----------------+-------------+-----------------+----------------+------------------+--------------+---------------------+--------------------+----------------------+</span>
<span class="o">|</span>   <span class="n">auc_is_click</span> <span class="o">|</span>   <span class="n">auc_is_like</span> <span class="o">|</span>   <span class="n">auc_long_view</span> <span class="o">|</span>   <span class="n">auc_macro</span> <span class="o">|</span>   <span class="n">gauc_is_click</span> <span class="o">|</span>   <span class="n">gauc_is_like</span> <span class="o">|</span>   <span class="n">gauc_long_view</span> <span class="o">|</span>   <span class="n">gauc_macro</span> <span class="o">|</span>   <span class="n">val_user_is_click</span> <span class="o">|</span>   <span class="n">val_user_is_like</span> <span class="o">|</span>   <span class="n">val_user_long_view</span> <span class="o">|</span>
<span class="o">+================+===============+=================+=============+=================+================+==================+==============+=====================+====================+======================+</span>
<span class="o">|</span>         <span class="mf">0.6824</span> <span class="o">|</span>        <span class="mf">0.5682</span> <span class="o">|</span>          <span class="mf">0.7021</span> <span class="o">|</span>      <span class="mf">0.6509</span> <span class="o">|</span>          <span class="mf">0.6394</span> <span class="o">|</span>         <span class="mf">0.6458</span> <span class="o">|</span>           <span class="mf">0.6503</span> <span class="o">|</span>       <span class="mf">0.6451</span> <span class="o">|</span>                 <span class="mi">217</span> <span class="o">|</span>                <span class="mi">131</span> <span class="o">|</span>                  <span class="mi">217</span> <span class="o">|</span>
<span class="o">+----------------+---------------+-----------------+-------------+-----------------+----------------+------------------+--------------+---------------------+--------------------+----------------------+</span>
</pre></div>
</div>
</section>
</section>
</section>


        </div>
        <div class="side-doc-outline">
            <div class="side-doc-outline--content"> 
<div class="localtoc">
    <p class="caption">
      <span class="caption-text">Table Of Contents</span>
    </p>
    <ul>
<li><a class="reference internal" href="#">3.5.2. Dynamic Weight Modeling</a><ul>
<li><a class="reference internal" href="#pepnet">3.5.2.1. PEPNET</a><ul>
<li><a class="reference internal" href="#epnet">3.5.2.1.1. EPNet: Scenario-Aware Embedding Personalization</a></li>
<li><a class="reference internal" href="#ppnet">3.5.2.1.2. PPNet: User-Aware Parameter Personalization</a></li>
</ul>
</li>
<li><a class="reference internal" href="#apg">3.5.2.2. APG</a></li>
<li><a class="reference internal" href="#m2m">3.5.2.3. M2M</a><ul>
<li><a class="reference internal" href="#id5">3.5.2.3.1. Backbone Network</a></li>
<li><a class="reference internal" href="#id6">3.5.2.3.2. Meta-Learning Network</a></li>
</ul>
</li>
</ul>
</li>
</ul>

</div>
            </div>
        </div>

      <div class="clearer"></div>
    </div><div class="pagenation">
     <a id="button-prev" href="1.multi_tower.html" class="mdl-button mdl-js-button mdl-js-ripple-effect mdl-button--colored" role="button" accesskey="P">
         <i class="pagenation-arrow-L fas fa-arrow-left fa-lg"></i>
         <div class="pagenation-text">
            <span class="pagenation-direction">Previous</span>
            <div>3.5.1. Multi-Tower Structures</div>
         </div>
     </a>
     <a id="button-next" href="../../chapter_3_rerank/index.html" class="mdl-button mdl-js-button mdl-js-ripple-effect mdl-button--colored" role="button" accesskey="N">
         <i class="pagenation-arrow-R fas fa-arrow-right fa-lg"></i>
        <div class="pagenation-text">
            <span class="pagenation-direction">Next</span>
            <div>4. Re-ranking Models</div>
        </div>
     </a>
  </div>
        
        </main>
    </div>
  </body>
</html>